Enabling Robots to Communicate their Objectives
The overarching goal of this work is to efficiently enable end-users to
correctly anticipate a robot's behavior in novel situations. Since a robot's
behavior is often a direct result of its underlying objective function, our
insight is that end-users need to have an accurate mental model of this
objective function in order to understand and predict what the robot will do.
While people naturally develop such a mental model over time through observing
the robot act, this familiarization process may be lengthy. Our approach
reduces this time by having the robot model how people infer objectives from
observed behavior, and then it selects those behaviors that are maximally
informative. The problem of computing a posterior over objectives from observed
behavior is known as Inverse Reinforcement Learning (IRL), and has been applied
to robots learning human objectives. We consider the problem where the roles of
human and robot are swapped. Our main contribution is to recognize that unlike
robots, humans will not be exact in their IRL inference. We thus introduce two
factors to define candidate approximate-inference models for human learning in
this setting, and analyze them in a user study in the autonomous driving
domain. We show that certain approximate-inference models lead to the robot
generating example behaviors that better enable users to anticipate what it
will do in novel situations. Our results also suggest, however, that additional
research is needed in modeling how humans extrapolate from examples of robot
behavior.
Comment: RSS 201
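The inference loop described above can be sketched numerically: a Boltzmann-rational observer performs approximate IRL over a discrete set of candidate objectives, and the robot selects the demonstration that most concentrates the observer's posterior on its true objective. This is a minimal illustrative sketch, not the paper's implementation; the function names, the linear reward model, and the toy feature vectors are all our assumptions.

```python
import numpy as np

def posterior_update(prior, shown_idx, traj_features, objectives, beta=1.0):
    """Observer's posterior over objectives after seeing trajectory shown_idx.

    traj_features: feature vector per candidate trajectory.
    objectives: candidate weight vectors; reward of a trajectory is w @ f.
    beta: observer rationality (higher = closer to exact IRL).
    """
    post = np.array(prior, dtype=float)
    for i, w in enumerate(objectives):
        # Boltzmann-rational likelihood: softmax over trajectory rewards under w
        logits = beta * np.array([w @ f for f in traj_features])
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        post[i] *= probs[shown_idx]
    return post / post.sum()

def most_informative_demo(prior, traj_features, objectives, true_idx, beta=1.0):
    """Robot side: pick the demo maximizing posterior mass on the true objective."""
    return max(
        range(len(traj_features)),
        key=lambda j: posterior_update(prior, j, traj_features, objectives, beta)[true_idx],
    )
```

With two orthogonal objectives and two trajectories, showing the trajectory preferred under the true objective shifts the observer's posterior toward it, which is the informativeness criterion the abstract describes.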
Coherent Soft Imitation Learning
Imitation learning methods seek to learn from an expert either through
behavioral cloning (BC) of the policy or inverse reinforcement learning (IRL)
of the reward. Such methods enable agents to learn complex tasks from humans
that are difficult to capture with hand-designed reward functions. Choosing BC
or IRL for imitation depends on the quality and state-action coverage of the
demonstrations, as well as additional access to the Markov decision process.
Hybrid strategies that combine BC and IRL are not common, as initial policy
optimization against inaccurate rewards diminishes the benefit of pretraining
the policy with BC. This work derives an imitation method that captures the
strengths of both BC and IRL. In the entropy-regularized ('soft') reinforcement
learning setting, we show that the behaviour-cloned policy can be used as both
a shaped reward and a critic hypothesis space by inverting the regularized
policy update. This coherency facilitates fine-tuning cloned policies using the
reward estimate and additional interactions with the environment. This approach
conveniently achieves imitation learning through initial behaviour cloning,
followed by refinement via RL with online or offline data sources. The
simplicity of the approach enables graceful scaling to high-dimensional and
vision-based tasks, with stable learning and minimal hyperparameter tuning, in
contrast to adversarial approaches.
Comment: 51 pages, 47 figures. DeepMind internship report
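The inversion described above admits a minimal numerical sketch for discrete actions: in entropy-regularized RL the soft update is pi_new(a|s) ∝ pi_prior(a|s) · exp(Q(s, a) / alpha), so treating a behaviour-cloned policy as one soft update away from a prior policy implies a shaped critic estimate. The function name and the toy policies below are our illustrative assumptions, not the paper's code.

```python
import numpy as np

def inverted_soft_value(log_pi_bc, log_pi_prior, alpha=1.0):
    """Q-estimate implied by inverting the regularized policy update:
    Q_hat(s, a) = alpha * (log pi_bc(a|s) - log pi_prior(a|s))."""
    return alpha * (log_pi_bc - log_pi_prior)

# Toy example: uniform prior over two actions, BC policy prefers action 0.
pi_bc = np.array([0.8, 0.2])
pi_prior = np.array([0.5, 0.5])
q_hat = inverted_soft_value(np.log(pi_bc), np.log(pi_prior))
```

The cloned policy thus doubles as a reward signal: actions it prefers over the prior receive positive shaped value, which is what makes subsequent fine-tuning with environment interactions coherent with the initial cloning step.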
On Multi-objective Policy Optimization as a Tool for Reinforcement Learning
Many advances that have improved the robustness and efficiency of deep
reinforcement learning (RL) algorithms can, in one way or another, be
understood as introducing additional objectives, or constraints, in the policy
optimization step. This includes ideas as far ranging as exploration bonuses,
entropy regularization, and regularization toward teachers or data priors when
learning from experts or in offline RL. Often, task reward and auxiliary
objectives are in conflict with each other and it is therefore natural to treat
these examples as instances of multi-objective (MO) optimization problems. We
study the principles underlying multi-objective RL (MORL) and introduce a new
algorithm,
Distillation of a Mixture of Experts (DiME), that is intuitive and
scale-invariant under some conditions. We highlight its strengths on standard
MO benchmark problems and consider case studies in which we recast offline RL
and learning from experts as MO problems. This leads to a natural algorithmic
formulation that sheds light on the connection between existing approaches. For
offline RL, we use the MO perspective to derive a simple algorithm that
optimizes for the standard RL objective plus a behavioral cloning term. This
outperforms the state of the art on two established offline RL benchmarks.
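The "RL objective plus a behavioral cloning term" loss mentioned above can be sketched as follows. This is an assumed TD3+BC-style form for illustration, not the paper's actual algorithm; the function name and weighting are ours.

```python
import numpy as np

def offline_policy_loss(q_values, policy_actions, dataset_actions, bc_weight=1.0):
    """Minimize: -Q(s, pi(s)) + bc_weight * ||pi(s) - a_dataset||^2."""
    rl_term = -np.mean(q_values)  # push the policy toward high-value actions
    bc_term = np.mean((policy_actions - dataset_actions) ** 2)  # stay near data
    return rl_term + bc_weight * bc_term
```

The two terms pull in the directions the abstract identifies as conflicting: the critic term rewards deviating toward high-value actions, while the cloning term penalizes drift from the dataset, and bc_weight trades them off.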
CpG Island microarray probe sequences derived from a physical library are representative of CpG Islands annotated on the human genome
An effective tool for the global analysis of both DNA methylation status and protein–chromatin interactions is a microarray constructed with sequences containing regulatory elements. One type of array suited for this purpose takes advantage of the strong association between CpG Islands (CGIs) and gene regulatory regions. We have obtained 20 736 clones from a CGI library and used these to construct CGI arrays. The utility of this library requires proper annotation and assessment of the clones, including CpG content, genomic origin and proximity to neighboring genes. Alignment of clone sequences to the human genome (UCSC hg17) identified 9595 distinct genomic loci; 64% were defined by a single clone while the remaining 36% were represented by multiple, redundant clones. Approximately 68% of the loci were located near a transcription start site. The distribution of these loci covered all 23 chromosomes, with 63% overlapping a bioinformatically identified CGI. The high representation of genomic CGIs in this rich collection of clones supports the use of microarrays produced with this library for the study of global epigenetic mechanisms and protein–chromatin interactions. A browsable database is available online to facilitate exploration of the CGIs in this library and their association with annotated genes or promoter elements.